
    Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

    We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it. We address the challenges of (i) combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, and (ii) limited feedback, since only the influenced portion of the network is observed. Under stochastic semi-bandit feedback, we propose and analyze IMLinUCB, a computationally efficient UCB-based algorithm. Our bounds on the cumulative regret are polynomial in all quantities of interest, achieve near-optimal dependence on the number of interactions, and reflect the topology of the network and the activation probabilities of its edges, thereby giving insights into the problem complexity. To the best of our knowledge, these are the first such results. Our experiments show that in several representative graph topologies, the regret of IMLinUCB scales as suggested by our upper bounds. IMLinUCB permits linear generalization and is thus both statistically and computationally suitable for large-scale problems. Our experiments also show that IMLinUCB with linear generalization can lead to low regret in real-world online influence maximization.
    Comment: Compared with the previous version, this version has fixed a mistake. This version is also consistent with the NIPS camera-ready version.
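    The sketch below illustrates the kind of UCB-based edge estimator with linear generalization that the abstract describes; it is not the authors' implementation. The edge feature vectors, the parameter names (lam, c), and the greedy seed-selection oracle it would feed into are all illustrative assumptions.

    import numpy as np

    class LinUCBEdgeEstimator:
        """UCB-style estimator of edge activation probabilities with linear
        generalization: each edge e has a feature vector x_e, and its
        activation probability is modeled as roughly <x_e, theta*>."""

        def __init__(self, d, lam=1.0, c=1.0):
            self.c = c                    # confidence-interval width (assumed tuning knob)
            self.M = lam * np.eye(d)      # regularized Gram matrix
            self.b = np.zeros(d)          # sum of features of activated edges

        def ucb(self, x):
            # Optimistic (upper-confidence) estimate of the activation
            # probability of an edge with features x, clipped to [0, 1].
            M_inv = np.linalg.inv(self.M)
            theta_hat = M_inv @ self.b
            mean = float(x @ theta_hat)
            width = self.c * float(np.sqrt(x @ M_inv @ x))
            return min(1.0, max(0.0, mean + width))

        def update(self, x, activated):
            # Semi-bandit feedback: for every observed edge we learn whether
            # its activation attempt succeeded (activated in {0, 1}).
            self.M += np.outer(x, x)
            self.b += activated * x

    In each round one would compute optimistic probabilities for all edges, hand them to an offline influence-maximization oracle (e.g., the standard greedy algorithm) to pick the seed set, run the cascade, and update the estimator on every observed edge.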

    Influence Maximization with Bandits

    We consider the problem of influence maximization: maximizing the number of people who become aware of a product by finding the "best" set of "seed" users to expose the product to. Most prior work on this topic assumes that we know the probability of each user influencing each other user, or that we have data that lets us estimate these influences. However, this information is typically not initially available or is difficult to obtain. To avoid this assumption, we adopt a combinatorial multi-armed bandit paradigm that estimates the influence probabilities as we sequentially try different seed sets. We establish bounds on the performance of this procedure under the existing edge-level feedback as well as a novel and more realistic node-level feedback. Beyond our theoretical results, we describe a practical implementation and experimentally demonstrate its efficiency and effectiveness on four real datasets.
    Comment: 12 pages
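    A minimal sketch of the edge-level feedback idea described above, under assumed data structures: each time an edge (u, v) is "tried" (u is active), we record whether v was activated, and form an optimistic estimate of the influence probability from the empirical frequency plus an exploration bonus in the usual combinatorial-bandit (CUCB-style) fashion. The bonus constant and the treatment of unexplored edges are illustrative choices, not the paper's.

    import math
    from collections import defaultdict

    class EdgeLevelEstimator:
        def __init__(self):
            self.trials = defaultdict(int)      # times edge (u, v) was tried
            self.successes = defaultdict(int)   # times the activation succeeded

        def record(self, edge, activated):
            self.trials[edge] += 1
            self.successes[edge] += int(activated)

        def optimistic_prob(self, edge, t):
            # Empirical mean plus a UCB exploration bonus at round t.
            n = self.trials[edge]
            if n == 0:
                return 1.0                      # unexplored edges stay optimistic
            mean = self.successes[edge] / n
            bonus = math.sqrt(1.5 * math.log(t) / n)
            return min(1.0, mean + bonus)

    Node-level feedback is coarser: only which nodes became active is observed, not which edge caused each activation, so the per-edge counts above are no longer directly available and must be inferred.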

    Near-Optimal Sample Complexity Bounds for Constrained MDPs

    In contrast to the advances in characterizing the sample complexity of solving Markov decision processes (MDPs), the optimal statistical complexity of solving constrained MDPs (CMDPs) remains unknown. We resolve this question by providing minimax upper and lower bounds on the sample complexity of learning near-optimal policies in a discounted CMDP with access to a generative model (simulator). In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint. For (i), we prove that our algorithm returns an $\epsilon$-optimal policy with probability $1 - \delta$ by making $\tilde{O}\left(\frac{S A \log(1/\delta)}{(1 - \gamma)^3 \epsilon^2}\right)$ queries to the generative model, thus matching the sample complexity for unconstrained MDPs. For (ii), we show that the algorithm's sample complexity is upper-bounded by $\tilde{O}\left(\frac{S A \log(1/\delta)}{(1 - \gamma)^5 \epsilon^2 \zeta^2}\right)$, where $\zeta$ is the problem-dependent Slater constant that characterizes the size of the feasible region. Finally, we prove a matching lower bound for the strict feasibility setting, thus obtaining the first near minimax optimal bounds for discounted CMDPs. Our results show that learning CMDPs is as easy as MDPs when small constraint violations are allowed, but inherently more difficult when we demand zero constraint violation.
    Comment: NeurIPS'22
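    The small illustration below, not taken from the paper, evaluates the leading terms of the two sample-complexity bounds quoted above. The $\tilde{O}(\cdot)$ notation hides constants and logarithmic factors, so these numbers only indicate how the query budget scales with $S$, $A$, $\gamma$, $\epsilon$, $\delta$, and $\zeta$.

    import math

    def relaxed_feasibility_samples(S, A, gamma, eps, delta):
        # Leading term of O~(S A log(1/delta) / ((1 - gamma)^3 eps^2))
        return S * A * math.log(1 / delta) / ((1 - gamma) ** 3 * eps ** 2)

    def strict_feasibility_samples(S, A, gamma, eps, delta, zeta):
        # Leading term of O~(S A log(1/delta) / ((1 - gamma)^5 eps^2 zeta^2));
        # zeta is the Slater constant measuring the size of the feasible region.
        return S * A * math.log(1 / delta) / ((1 - gamma) ** 5 * eps ** 2 * zeta ** 2)

    # Example (hypothetical sizes): the strict setting pays an extra
    # 1 / ((1 - gamma)^2 zeta^2) factor over the relaxed setting.
    print(relaxed_feasibility_samples(S=100, A=10, gamma=0.9, eps=0.1, delta=0.05))
    print(strict_feasibility_samples(S=100, A=10, gamma=0.9, eps=0.1, delta=0.05, zeta=0.1))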